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ABSTRACT 


Leishmania, a unicellular eukaryotic parasite, is a 
unique model for aneuploidy and cellular hetero- 
geneity, along with their potential role in adap- 
tation to environmental stresses. Somy variation 
within clonal populations was previously explored 
in a small subset of chromosomes using fluores- 
cence hybridization methods. This phenomenon, 
termed mosaic aneuploidy (MA), might have impor- 
tant evolutionary and functional implications but 
remains under-explored due to technological lim- 
itations. Here, we applied and validated a high 
throughput single-cell genome sequencing method 
to study for the first time the extent and dynam- 
ics of whole karyotype heterogeneity in two clonal 
populations of Leishmania promastigotes represent- 
ing different stages of MA evolution in vitro. We 
found that drastic changes in karyotypes quickly 
emerge in a population stemming from an al- 
most euploid founder cell. This possibly involves 
polyploidization/hybridization at an early stage of 
population expansion, followed by assorted ploidy 
reduction. During further stages of expansion, MA 
increases by moderate and gradual karyotypic alter- 
ations, affecting a defined subset of chromosomes. 
Our data provide the first complete characterization 
of MA in Leishmania and pave the way for further 
functional studies. 
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INTRODUCTION 


Aneuploidy, 1.e. an imbalance in the copy number of chro- 
mosomes in a cell, occurs in a wide range of organisms, 1n- 
cluding both non- and pathogenic unicellular eukaryotes, 
such as Saccharomyces cerevisiae, Candida albicans, Crypto- 
coccus neoformans and Leishmania spp., but also in different 
types of human cancer cells (1-6). Although generally con- 
sidered to be detrimental in multicellular organisms, aneu- 
ploidy can also be beneficial, in particular for unicellular or- 
ganisms facing drastic changes in the environment (7,8). In 
pathogens, aneuploidy facilitates rapid adaptation to envi- 
ronmental stresses through changes 1n gene dosage and may 
have an impact on both virulence and the development of 
drug resistance (7,9-14). 

Leishmania, a genus of digenetic protozoan parasites, 1s 
emerging as a unique model for aneuploidy (15). These 
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parasites are responsible for a spectrum of clinical forms 
of leishmaniasis worldwide and cause 300,000 new cases 
per year (16). They can be found in two forms during 
their life cycle: as an extracellular promastigote in the 
midgut of phlebotomine sand fly vectors and exclusively 
as intracellular amastigote inside mammalian host phago- 
cytic cells. Thus, Leishmania parasites are adapted to these 
two drastically different environments. From a molecular 
point of view, Leishmania, as other trypanosomatids, is 
unique in the Eukaryota domain (17). This includes the ge- 
nomic organization in long polycistronic units, the near ab- 
sence of transcription initiation regulation by RNA poly- 
merase II promoters, and a remarkable genomic plasticity 
(14,18). The Leishmania genome is generally considered to 
be diploid, although all Leishmania genomes analyzed hith- 
erto display aneuploidy affecting at least one chromosome, 
1.e., a polysomy in chr31. Moreover, high levels of ‘aver- 
age’ aneuploidy (average will be used throughout this pa- 
per for features derived from bulk analyses of population 
of cells) affecting other chromosomes are commonly found 
by bulk genome sequencing of in vitro cultured promastig- 
otes of all Leishmania species tested so far (1,2). This aver- 
age aneuploidy is highly dynamic and changes when culti- 
vated parasite populations are exposed to different environ- 
ments such as the vector, the vertebrate host or in response 
to drug pressure (19-21). In fact, changes in average aneu- 
ploidy pattern and not variation in nucleotide sequence are 
the first genomic modifications observed at populational 
level during the course of experimental selection of drug 
resistance (21,22). Given that these alterations in average 
somies are reflected in the average amount of corresponding 
transcripts (19,23), and to a certain degree, of proteins (24), 
it has been proposed that aneuploidy allows Leishmania to 
adapt by means of rapid changes in gene dosage. 

Leishmania parasites exhibit a remarkable cellular het- 
erogeneity in the form of mosaic aneuploidy, where indi- 
vidual daughter cells originating from a single parent (1.e., 
a clonal population) may display distinct somies (3,25). 
The full extent of mosaic aneuploidy in Leishmania and its 
dynamics during adaptation to new environment remains 
largely unexplored due to technological limitations. The 
only estimation of karyotype heterogeneity was based on 
the FISH studies of a small set of chromosomes, where 
it was speculated that thousands of karyotypes may co- 
exist in a clonal population of Leishmania promastigotes 
(3). Mosaicism was proposed to provide a source of func- 
tional diversity within a population of Leishmania cells, 
through gene dosage, but also through changes in het- 
erozygosity (23,26). This diversity of karyotypes would pro- 
vide an adaptive potential to unpredictable environmen- 
tal changes during the parasite’s life cycle or drug pressure 
caused by patient treatment (23,26). 

FISH-based pioneer studies should be complemented 
and refined by single cell approaches of whole genome se- 
quencing. In a previous study, we made a first step in that 
direction by combining FACS-based sorting of single Leish- 
mania cells with whole genome amplification (WGA) and 
whole genomic sequencing (WGS). In that pilot study, we 
evaluated different WGA and bioinformatic methods and 
were able to successfully call somy of all chromosomes in 28 
single Leishmania cells (27). Here, we applied and validated 


for the first time a high throughput, droplet-based platform 
for single cell genome sequencing (SCGS) of thousands of 
individual Leishmania promastigotes. This allowed a first 
assessment of the degree and the dynamics of the evolution 
of mosaic aneuploidy in two clonal populations in vitro rep- 
resenting different stages of expansion in culture conditions. 
Based on our study, we propose that the early stages of ex- 
pansion are characterized by rapid and drastic changes in 
karyotypes, allowing initial establishment of highly aneu- 
ploid cells in a population of almost euploid parasites. In the 
next steps, the existing highly aneuploid karyotypes further 
evolve through gradual and moderate changes 1n somies re- 
sulting in a population of aneuploid cells displaying closely 
related karyotypes. Our findings strongly support the hy- 
pothesis that mosaic aneuploidy is a constitutive feature of 
Leishmania parasites, representing a unique source of func- 
tional diversity. 


MATERIALS AND METHODS 
Parasites 


In the present paper, we use the terms population, strain 
and clone, following the nomenclature of salivarian try- 
panosomes (28). Accordingly: (1) a population is a group of 
Leishmania cells present at a given time in a given culture or 
host; (11) a strain is a population derived by serial passage 
in vitro from a primary isolate (in our case, from patient 
samples) without any implication of homogeneity but with 
some degree of characterization (in our case bulk genome 
sequencing); (111) a clone is derived from a strain and is a 
population of cells derived from a single individual pre- 
sumably by binary fission. L. donovani promastigotes were 
maintained at 26°C in HOMEM medium (Gibco, Ther- 
moFisher) supplemented with 20% Fetal Bovine Serum, 
with regular passages done every 7 days at 1/25 dilutions. 
The clones BPK282 cl4 and BPK081 cl8 were derived from 
two strains adapted to culture: MHOM/NP/02/BPK282/0 
and MHOM/NP/02/BPK081/0 (29). These clones were 
submitted to SCGS at 21 (~126 generations) and 7 passages 
(~56 generations) after cloning respectively (Supplemen- 
tary Figure Sl). Four strains were mixed to create an 
artificial “super-mosaic’ population of cells (further called 
super-mosaic): BPK475 (MHOM/NP/09/BPK475/9), 
BPK498 (MHOM/NP/09/BPK498 /0), BPK506 
(MHOM/NP/09/BPK506/0) and HU3 
(MHOM/ET/67/HU3). They were kept in vitro for 
several passages after isolation from patients (respectively 
41, 60, 47 and >24) and mixed at equivalent ratio just 
before preparation for SCGS. 


Single-cell suspensions preparation and sequencing 


Promastigotes at early stationary phase (day 5) were har- 
vested by centrifugation at 1000 rcf for 5 min, washed twice 
with PBS I x (calcium and magnesium-free) + 0.04% BSA, 
diluted to 5 x 10° parasites/mL and passed through a 5 
wm strainer to remove clumps of cells. After straining, vol- 
ume was adjusted with PBS |x + 0.04% BSA to achieve 
a final concentration of 3 x 10° parasites/ml. The absence 
of remaining cell doublets or clumps in the cell suspen- 
sion was confirmed by microscopy. Cell viability was es- 
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timated by flow cytometry (BD FACSverse™) using the 
NucRed™ Dead 647 probe (Life technologies™) following 
the recommendations of the manufacturer and in all sam- 
ples was estimated as higher than 95%. SCGS was per- 
formed using the Chromium™ single-cell CNV solution 
(scCNV) from 10X Genomics™. To target an average of 
2000 sequenced cells per sample, 4.2 wl of the cell suspen- 
sions were used as input, and cell encapsulation, barcoding, 
whole genome amplification and library preparation were 
performed following manufacturer’s recommendations. Se- 
quencing of the libraries was done with an Illumina No- 
vaSeq' SP platform with 2 x 150 bp reads. 


Single-cell somy estimation 


Details about the bioinformatic analysis for somy values de- 
termination are provided in the supplementary material. In 
summary, sequence reads were associated to each sequenced 
cell based on their barcodes and mapped to a customized 
version of the reference L. donovani genome LdBPKv?2 (19) 
using the Cell Ranger DNA™ software (10x Genomics). 
The matrix generated by the software with the number of 
mapped reads per 20 kb bins was used as input to a cus- 
tom script written in R (30). In this script, bins with out- 
lier values were excluded, and the mean normalized read 
depth of each chromosome was calculated for each cell. 
Cells displaying a high intra-chromosomal variation were 
removed from downstream analysis. In order to establish 
the baseline ploidy of each cell and define the somies of a 
cell, the mean normalized depth values of the cell’s chro- 
mosomes were multiplied by the cell scale factor, defined 
for each cell as the lowest number between 1.8 and 5 that 
leads to the shortest distances to integers when the chro- 
mosomes’ mean normalized depth values are multiplied by 
it. The scaled mean normalized depth values are referred 
here as ‘raw somies’. To convert the raw somies (continu- 
ous) into integer copy numbers (discrete), a univariate gaus- 
sian mixture-model was built for each chromosome by an 
expectation-maximization algorithm based on the distribu- 
tion of the raw somy values between all cells of the same 
sample using the Mixtools package (31). For each possible 
integer somy, a gaussian mixture-model was generated and 
each raw somy value was assigned to the rounded mean of 
the gaussian to which it has higher probability of belonging 
to. 


Karyotype identification and network analysis 


A karyotype was defined as the combination of integer 
somies of all chromosomes in a cell. Karyotypes were nu- 
merically named according to their frequency in the se- 
quenced population. To generate the network representing 
the dissimilarities between the karyotypes, a pairwise dis- 
tance matrix was built based on the number of different 
chromosomes between all karyotypes in a sample, and used 
to create a randomized minimum spanning tree with 100 
randomizations, using the Pegas R package (32,33). The 
network visualization was made with the visNetwork pack- 
age (34). 
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Doublet detection 


The relative fraction of doublets within the super mosaic 
population has been estimated based on the high number of 
single nucleotide polymorphisms (SNPs) found in the HU3 
strain when compared to the L. donovani reference genome. 
The three other strains in the super mosaic only show a lim- 
ited number of SNPs in contrast. Potential doublets were 
identified by looking for mixture of both SNP profiles (HU3 
and non-HU3) in assumed single-cell data. This approach 
was applied using an in-house developed algorithm and the 
Demuxlet algorithm (35), both approaches leading to iden- 
tical results (see Supplementary Text). 


DNA probes and fluorescence in situ hybridization 


DNA probes were either cosmid (L549 specific of chromo- 
some |) or BAC (LB00822 and LB00273 for chromosomes 
5 and 22, respectively) clones that were kindly provided 
by prof. Peter Myler (Center for Infectious Disease Re- 
search, formerly Seattle Biomedical Research Institute) and 
Christiane Hertz-Fowler (Sanger Centre). DNA was pre- 
pared using Qiagen Large-Construct Kit and labelled with 
tetramethyl-rhodamine-5-dUTP (Roche Applied Sciences) 
by using the Nick Translation Mix (Roche Applied Sci- 
ences) according to manufacturer instructions. Leishmania 
cells were fixed in 4% paraformaldehyde then air-dried on 
microscope immunofluorescence slides, dehydrated in serial 
ethanol baths (50—100%) and incubated in NP40 0.1% for 5 
min at RT. Around 100 ng of labelled DNA probe was di- 
luted in hybridization solution containing 50% formamide, 
10% dextran sulfate, 2 x SSPE, 250 wg.mL~! salmon sperm 
DNA. Slides were hybridized with a heat-denatured DNA 
probe under a sealed rubber frame at 94°C for 2 min and 
then overnight at 37°C and sequentially washed in 50% 
formamide/2x SSC at 37°C for 30 min, 2x SSC at 50°C 
for 10 min, 2x SSC at 60°C for 10 min, 4x SSC at room 
temperature. Finally, slides were mounted in Vectashield 
(Vector Laboratories) with DAPI. Fluorescence was visual- 
ized using appropriate filters on a Zeiss Axioplan 2 micro- 
scope with a 100x objective. Digital images were captured 
using a Photometrics CoolSnap CCD camera (Roper Sci- 
entific) and processed with MetaView (Universal Imaging). 
Z-Stack image acquisitions (15 planes of 0.25 jm) were sys- 
tematically performed for each cell analyzed using a Piezo 
controller, allowing to view the nucleus in all planes and to 
count the total number of labelled chromosomes. Around 
200 cells [187-228] were analyzed per chromosome. 


Bulk genome sequencing 


Genomic DNA from the BPK282 cl4 and BPK0O81 cl8 
clones was extracted in bulk using the QlAmp™ DNA 
Mini kit (Qiagen) following manufacturer’s recom- 
mendations. PCR-free whole genome sequencing was 
performed on the Illumina NovaSeq platform using 
2 x 150 bp paired reads. Reads were mapped to the 
reference genome L. donovani LdBPKv2 (available at 
ftp://ftp.sanger.ac.uk/pub/project/pathogens/Leishmania/ 

donovani/LdBPKPAC2016beta/) using BWA _ (version 
0.7.17) with seed length set to 100 (36). Only properly 
paired reads with a mapping quality higher than 30 were 
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selected using SAMtools (37). Duplicates reads were 
removed using the RemoveDuplicates command in the 
Picard software (http://broadinstitute.github.1o/picard/). 
The average somy values were calculated as described 
previously (1), by dividing the median sequencing depth 
of a chromosome by the overall median sequencing depth 
over all chromosomes, and multiplying this ratio by 2. 
These values were used to define an average karyotype for 
the sequenced population of cells. 


Gene Ontology analysis and in silico screening for small RNA 


Gene Ontology (GO) classes were obtained from TriT- 
rypDB release 49 (38). As the genome sequence stored on 
TriTrypDB does not correspond to the reference genome 
used in this work, the GO annotation was obtained by map- 
ping back all genes to our reference genome using BlastP 
(39). Clustering of the different chromosomes based on 
their assigned GO classes was performed using the prcomp 
command in R. 

Two different data sources were used for non-coding 
RNA screening. First, the Rfam (40) database version 
14.4 was used to screen the L. donovani LdBPKv?2 refer- 
ence genome using the cmscan algorithm as implemented 
in Infernal (41) using default parameters and setting the 
search space parameter to 64. Second, the non-coding 
RNA nucleotide sequences as annotated in the L. dono- 
vani BPK282A1 genome on TriTrypDB version 54 were ex- 
tracted, and aligned versus the reference genome used in 
this work (L. donovani LAGBPKv2). Next, both data sources 
were compared with each other using BLAST, and redun- 
dant copies were removed (i.e., only one copy retained to 
avoid duplicates). 


RESULTS 


High throughput single-cell genome sequencing as a reliable 
tool to explore karyotype heterogeneity in Leishmania popu- 
lations 


We applied high throughput single-cell genome sequenc- 
ing (SCGS) to address mosaic aneuploidy in promastig- 
otes of two Leishmania clones differing substantially in av- 
erage aneuploidy as revealed by bulk genome sequencing: 
(1) BPK282 cl4, an aneuploid clone showing seven chromo- 
somes with an average trisomy apart from the usual average 
tetrasomy in chr31 and (11) BPK081 cl8, showing an aver- 
age disomy for all chromosomes except chr31 (average tetra- 
somic); for simplicity, we will call BPK081 cl8 the ‘diploid’ 
clone. Details about sequencing statistics are provided in 
Supplementary Table S1. First analyses of the SCGS data 
were made with the Cell Ranger DNA™ pipeline. Although 
the software was developed for mammalian genomes, which 
are up to 2 orders of magnitude larger than Leishmania’s 
nuclear genome, it allowed detecting (1) aneuploidy, (11) mo- 
saicism and (111) large intrachromosomal CNVs, as, for in- 
stance, the H- and M- amplicons (1) in chr23 and chr36 
respectively (Supplementary Figure S2). However, techni- 
cal artifacts were noticed especially in BPK081 cl8, where 
the software’s GC bias correction algorithm, designed for 
the mammalian genome which display a lower average GC 
(41%) content compared to Leishmania (60%), ended up 


overcompensating the depth of bins with high GC con- 
tent (Supplementary Figure S2). Because of that and given 
our main goal of using SCGS to study mosaic aneuploidy, 
we built our own analytical bioinformatic pipeline with a 
higher emphasis on estimating whole chromosomes copy 
numbers rather than local CNVs (Supplementary Figure 
S3). 

We evaluated the SCGS method and our analytical 
pipeline by first addressing their ability to explore karyotype 
heterogeneity among Leishmania cells of clones BPK 282 cl4 
and BPK081 cl8. Using our analytical pipeline, we identi- 
fied 208 different karyotypes among the 1516 filtered cells 
of BPK282 cl4 and 117 karyotypes among the 2378 filtered 
cells of BPK081 cl8 (Figure 1A, B, Supplementary Figure 
S4A, B). Moreover, the cumulative SCGS profile of each 
clone was consistent with their respective average aneu- 
ploidy profile (Figure 1A and B, left panel). Notably, chr13, 
which displays a non-integer average somy value (2.26) in 
BPK282 cl4, was found as disomic and trisomic at rela- 
tively high proportions in the SCGS, resulting in a simi- 
lar cumulative somy (2.34). As expected, the vast majority 
of cells in BPKO081 cl8 displayed an almost diploid kary- 
otype, with only chr31 showing a tetrasomy as expected. 
Small subpopulations of cells displaying highly aneuploid 
karyotypes were also observed in BPK081 cl8 (discussed 
below). 

Mosaic aneuploidy in Leishmania has been studied so 
far with fluorescence in situ hybridization (FISH), the only 
alternative method available hitherto to estimate the copy 
number of some chromosomes in individual Leishmania 
cells. As a mutual benchmark of both FISH and SCGS 
methods, we submitted cells from both BPK282 cl4 and 
BPK081 cl8 to FISH to estimate the copy number of chro- 
mosomes chr], chr5 and chr22 and to compare the obtained 
results with the values observed in our SCGS data (Figure 
1C). Overall, for each chromosome, the same predominant 
somy was observed with both methods, even when the pre- 
dominant somy was different between clones. For instance, 
FISH and SCGS report chr5 in BPK 282 cl4 as trisomic in 
most cells, while it is reported as mainly disomic in BPK081 
cl8 also by both techniques. Most discrepancies between the 
proportions obtained by both methods are within the 10% 
error margin previously estimated for FISH (3 and unpub- 
lished results). The main exception is chr5 in BPK282 cl4, 
which 1s estimated as trisomic in 93% of the cells with SCGS 
and 66% with FISH. However, SCGS reports proportions 
which are more consistent with the average somy values ob- 
served in the bulk analysis of each clone. For instance, the 
weighted mean between somy values obtained with SCGS 
for chr5 in BPK282 cl4 results in an average somy of 2.95, 
which is very similar to the average somy value observed in 
bulk (2.97), whereas with FISH, the average somy is lower 
(2.66), suggesting that the proportions observed with SCGS 
are more accurate. 

In order to test whether SCGS did not underestimate 
mosaicism, because of potential biases in the amplifica- 
tion of some chromosomes, we chose 4 strains character- 
ized by a high average aneuploidy—previously assessed by 
bulk genome sequencing (29)—and mixed them together 
to create a “super-mosaic’ population which was submitted 
to a single SCGS run. A total of 1900 promastigotes were 
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Figure 1. Mosaic aneuploidy in BPK 282 cl4 and BPK081 cl8 clones revealed by SCGS and validation of the method. (A, B) Heat maps displaying the copy 
number of all 36 chromosomes of promastigotes from BPK282 cl4 (A) or BPK081 cl8 (B) clones (main panels). Each column represents a single parasite. 
The number of sequenced promastigotes and karyotypes found in each sample is described in the x axis. In each panel, two insets display the average 
aneuploidy profile (bulk) of the clonal population used in the SCGS and their respective parental strain. (C) Comparison between FISH and SCGS. The 
proportion of cells displaying monosomy, disomy or trisomy for chromosomes 1, 5 and 22 in each method is represented. Around 200 cells [187—228] were 
analyzed per chromosome in FISH. (D) Heat map displaying the karyotypes of the promastigotes from four different strains mixed in a single SCGS run. 
Cells were hierarchically clustered according to their karyotypes, forming 4 major clusters. The number of cells in each cluster is indicated in the x axis. 
The bar at the top of the heatmap indicate if the SNP profile of the cell corresponds to a BPK strain (black), a HU3 strain (orange) or a doublet (purple). 


individually sequenced, of which, 1636 remained after data 
filtering. This ‘super mosaic’ population displayed a high 
aneuploidy diversity: 388 identified karyotypes in total. As 
expected, the 1636 promastigotes formed four distinct clus- 
ters based on their integer somy values, with discrete differ- 
ences in the aneuploidy patterns between each cluster (Fig- 
ure 1D). Since one of the strains (HU3) used in this su- 
per mosaic is phylogenetically distant from the other three 
strains (BPK475, BPK498 and BPK506), we could distin- 
guish HU3 promastigotes from the others based on their 
SNP profiles. Interestingly, all HU3 cells were grouped to- 
gether in cluster C (Figure 1D—orange lines in the anno- 
tation bar), suggesting that the discrete karyotypic differ- 
ences between the major clusters reflect differences among 
the aneuploidy profiles of the four strains, so that each clus- 
ter likely represents one of the strains. Thus, this experiment 
demonstrates that SCGS is effective in distinguishing kary- 
otypes even in very complex populations. 


BPK282 cl4 and BPK081 cl8 cells display different patterns 
of karyotype evolution during clonal expansion 


After validating the SCGS method for resolving complex 
karyotype heterogeneity in Leishmania, we returned to the 
data of BPK282 cl4 and BPKO81 cl8 to characterize the 
karyotypes that are present in each clone. In BPK282 cl4, 
the most frequent karyotypes were very similar to each 
other, diverging by copy number changes in one to three 
chromosomes when compared to the most frequent kary- 
otype (karl—Figure 2A). In BPK0O81 cl8, however, the 
nearly diploid karl, which was present in 82% of the cells, 
and the two next most abundant karyotypes showed very 
different aneuploidy profiles, diverging by copy numbers 
of 8 to 10 chromosomes (Figure 2B). In addition, in both 
clones, the most frequent karyotype (kar1) is similar to the 
average aneuploidy profile of the respective parent strain 
from which each clone was derived (Figure 1A, B, left 
panel), suggesting that, in each clone, karl corresponds to 


ccoc Aeniqa4 G| uo ysan6 Aq ¢g06S79/E6Z/1/0G/a|0lWe/Jeu/Woodno‘olwapede//:sd}Yy Wo papeojuMoq 


298 Nucleic Acids Research, 2022, Vol. 50, No. 1 


BPK282 cl4 


5237 30272015128 7666555555444444433333 3333333 





OONBDNARWNM— 
1_i{_j{_j_j_j_} 
OONDOTARWOM— 
a 


Copy 
Number 


1 


Chromosome 





ak oP 








123 45 6 7 8 9 101112131415 1617 1819 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 


Karyotype 


BPK282 cl4 


Nodes: 


@ Karyotype found in more than 1 cell 
© Karyotype found in only 1 cell 


BPK081 cl8 


Os seen eect unsees cone taste ae ese 





Chromosome 











12345 6 7 8 9 101112131415 1617 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 


Karyotype 


BPK081 cl8 





Edges: 


1 chromosome with different somy 





2 or more (N) chromosomes with different somy 





seeenet Alternative single-step links 


Figure 2. BPK282 cl4 and BPK081 cl8 display different profiles in the dissimilarity relationship between karyotypes. (A, B) Heat map depicting the 40 most 
frequent karyotypes in BPK 282 cl4 (A) and BPK081 cl8 (B) clones. The bars at the top indicate how many cells are found with each depicted karyotype. (C) 
Network representing the dissimilarity relationship between karyotypes in each clone. Black nodes represent karyotypes found in more than one cell, with 
their size proportional to the number of cells. Green nodes indicate karyotypes which occur only once. Black lines link two karyotypes which diverge by a 
somy difference in a single chromosome, while orange lines link karyotypes diverging by two or more chromosomes with different somy, with the number 
of divergent chromosomes indicated in the edge. Dashed grey lines show alternative links between karyotypes with a single somy divergency. Polyploid 


karyotypes were not included in the networks. 


the karyotype of the founder cell, and thus, the other kary- 
otypes of each population arose from their respective kar]. 

To develop a hypothesis of the karyotype evolution dur- 
ing expansion of both BPK282 cl4 and BPK081 cl8 popu- 
lations, we built a dissimilarity network based on the num- 
ber of chromosomes with different copy numbers between 
each karyotype found in each population (Figure 2C). Both 


populations of cells are at different stages of expansion 
(about 126 and 56 generations since cloning, respectively), 
but we observe in each of them a proportionally compara- 
ble number of somy changes events (steps in the network): 
(i) for BPK282 cl4, 514 steps/126 generations/1516 se- 
quenced cells = 0.0027 and (11) for BPK081cl8, 260 steps/56 
generations/2378 sequenced cells = 0.002. However, dis- 
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tinct patterns are observed between both clones. In BPK282 
cl4, the most frequent karyotypes (black nodes) are linked 
to each other by somy changes in only single chromosomes 
(black lines). Assuming karl as the founder of this popu- 
lation, almost every frequent karyotype can be traced back 
to it through cumulative single copy number alterations. In 
contrast, the network of BPK081 cl8 shows a very distinct 
pattern (Figure 2C). Here, the 3 most frequent karyotypes 
are distant from one another and lack single-step interme- 
diates between them. 


High frequencies of polysomies are restricted to a specific 
eroup of chromosomes 


We and others have demonstrated that high frequencies of 
polysomies were restricted to a specific subset of chromo- 
somes when comparing the average aneuploidy of 204 L. 
donovani strains previously analyzed in bulk (23,29). To ad- 
dress if the same applies to single Leishmania cells, we cre- 
ated a diverse artificial population by randomly selecting 
and merging the data of equal numbers of single cells from 
BPK282 cl4 and BPK081 cl8 as well as from each cluster 
of the super mosaic population, assuming each cluster rep- 
resents one of the mixed strains. In this artificial popula- 
tion, we observed that at least 15 chromosomes are consis- 
tently disomic in the vast majority of cells in a clone/strain- 
independent manner (Figure 3A). All these chromosomes 
also show an average disomy in most of the 204 strains 
mentioned above (Supplementary Figure S5A, B). These 
chromosomes are referred here as ‘mainly disomic’. Con- 
versely, apart from the usually tetrasomic chr31, 8 chro- 
mosomes (chr5, chr8, chr9, chr13, chr20, chr23, chr26 and 
chr33) are found with 3 or more copies in most cells, again 
fitting with previous observations made on the 204 L. dono- 
vani strains (23,29). We call these chromosomes as “mainly 
polysomic’. A Spearman correlation test highlighted poten- 
tial synchronies between the copy numbers of these chro- 
mosomes, with the strongest correlations being observed 
between chr20 and chr8, and between chr5 and chr9 (Fig- 
ure 3B, Supplementary Figure S5C). There is also an as- 
sociation between the somies of chr3]1 and chr15, which 
is also noticeable in Figure 3A, where when the somy of 
chr31 increases to 5, the somy of chr15 also increases to 
three or even four copies, suggesting a potential dosage in- 
terplay between these chromosomes. Strong dosage corre- 
lations are also observed between chromosomes chr6 and 
chr7, chr10 and chr14, chr15 and chr28, chrl6 and chr35, 
and between chromosomes chr28 and chr32. These chro- 
mosomes are usually found as disomic but still display a 
polysomy in a relatively large (>5%) subgroup of cells. We 
refer to these chromosomes as ‘intermediate’. 

It is unclear whether (1) the disparity in the frequency of 
polysomies between chromosomes is due to intrinsic differ- 
ences in the chances of overamplification of each chromo- 
some along the expansion of the population (some chro- 
mosomes being specifically unstable) or (11) 1f every chro- 
mosome has the potential to have its somy altered but the 
expansion of polysomies in a population is limited by se- 
lective pressures. Thus, we revisited the karyotype network 
of each population, including the ‘super mosaic’ (Supple- 
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mentary Figure S5D), to investigate differences in dosage 
stability between chromosomes. In this analysis, we deter- 
mined which were the chromosomes that were different be- 
tween a karyotype and each other karyotype that was con- 
nected to it in the network. Thus, unstable chromosomes are 
expected to change more frequently between karyotypes. 
Connections between two karyotypes found in more than 
1 cell were named as ‘common karyotypes’ while connec- 
tions where at least one of the karyotypes was found in only 
1 cell were marked as ‘rare karyotypes’ (Figure 3C). As ex- 
pected, the 15 mainly disomic chromosomes display little, 
if any, alteration events in their copy numbers among the 
common karyotypes in all three samples (Figure 3D). How- 
ever, between the rare karyotypes, all chromosomes are sus- 
ceptible to somy alterations, although the mainly polysomic 
chromosomes still display a higher alteration frequency (P- 
value < 0.0001 — Supplementary Figure S5E). Interestingly, 
the mainly polysomic chromosomes chr20 and chr26 are 
among the most stable chromosomes between the common 
karyotypes but are still highly variable among the rare kary- 
otypes. A similar trend is also observed for chromosomes 
chr5, chr23 and chr31, which could be an indicative of se- 
lective pressure limiting dosage changes in these chromo- 
somes. 

In order to investigate potential features specific to the 
mainly polysomic chromosomes that could be related to 
their higher frequency of polysomies, we performed a se- 
ries of in silico enrichment analysis. Gene Ontology (GO) 
analysis did not reveal any obvious relationship between 
gene content and the prevalence of polysomies (Supple- 
mentary Figure S6). As the first analysis took into account 
only protein-coding genes, we turned our attention to non- 
coding RNAs, which could be relevant in this context, given 
the known high level of post-transcriptional regulation in 
Leishmania. Indeed, an in silico scan for non-coding RNAs 
(rRNA, tRNA, snoRNA and others) suggested an enrich- 
ment of small RNAs in some of the mainly polysomic 
chromosomes (Figure 3E), and especially for small nucle- 
olar RNAs (snoRNAs), a statistical significant enrichment 
could be detected on mainly polysomic chromosomes (P- 
value < 0.001, Fisher’s exact test). A significant number of 
hits for snoRNAs are mapped to chr5, chr26 and chr33, 
which are among the chromosomes with the most frequent 
polysomies, as well as chr35, which is trisomic in the ma- 
jority of BPK282 cl4 cells and is also trisomic in the aver- 
age aneuploidy of several L. donovani strains (29). Although 
preliminary, this observation suggests a potential relation- 
ship between the snoRNAs content of a chromosome and 
its prevalence of polysomies in cultivated promastigotes. 


SCGS reveals particular karyotypes among rare single cells 


As shown above, kar2 and kar3 of BPK081 cl8 show a base- 
line diploidy, 1.e. the majority of chromosomes are disomic, 
with 8—10 trisomic chromosomes and tetrasomy or even a 
pentasomy for chr31. However, we found in the same pop- 
ulation 5 cells with a karyotype similar to kar2 and kar3, 
but in which the scaling algorithm (see supplementary text 
for details on how cells ploidies are determined) defined the 
baseline ploidy of their karyotypes as triploid (kar 13 and 
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Figure 3. High frequencies of polysomies are restricted to a specific subset of chromosomes. (A) Heat map depicting the copy number of the 36 chromosomes 
across promastigotes from different clones/strains. Here, 251 promastigotes of each cluster of the mixed sample and from BPK 282 cl4 and BPK081 cl8 are 
represented. Chromosomes are hierarchically clustered based on their somy values. (B) Chord diagrams representing the Spearman correlation between the 
copy number of all chromosomes. Only correlations higher than 0.4 and with p-value lower than 0.05 are represented. (C) Ilustration of the analysis done 
based on the karyotype networks of BPK282 cl4, BPK081 cl8 and the super mosaic population in order to quantify changes in somy for each chromosome 
across different karyotypes. In this image, karl and kar2 are found in more than two cells (black nodes), so they are considered common karyotypes. Here, 
chr13 is the only chromosome that displays a different somy between them, so this is considered as a somy change event for chr13. A second karyotype 
differs from karl by a change in somy in chr21. As this karyotype is found only in one cell, it is considered a rare karyotype. (D) Graph indicating the 
number of somy change events for each chromosome among the common karyotypes (found in two or more cells — top panel) or the rare karyotypes (found 
in only one cell - bottom) in the 3 samples submitted to SCGS. Chromosomes are divided in three groups: mainly disomic (found as disomic 1n more than 
95% of the cells), intermediate (found as polysomic in more than 5% of the cells but less than 50%), and the mainly polysomic (found as polysomic in more 
than 50% of the cells). (E) Distribution of non-coding RNAs across L. donovani genome. Ribosomal RNAs (rRNA), small nucleolar RNAs (snoRNAs) 
and transporter RNAs (tRNAs) were identified based on the Rfam and TrypDB database. 


kar 35 in Figure 2B, Supplementary Figure S7). These cells 
also display a higher depth (0.49 x on average) compared to 
the other cells considered diploid (0.29 x on average), indi- 
cating they had a higher DNA content. Although it is not 
possible to rule out that these cells are in fact doublets, it 
is expected that a doublet between two diploid cells would 
yield either a diploid profile, as the ones identified in the 
doublets from the super-mosaic population (Supplemen- 
tary Figure S8) or a tetraploid profile, similar to the ones 
observed in kar30 and kar32 for instance. However, scal- 
ing these cells to 2” or 4n yields raw somy values which are 
further away from integers than when they are scaled to 3n 
(Supplementary Figure S3C), indicating that the ratios be- 
tween the depths of the chromosomes of these cells are more 
compatible with a 3n profile. Noteworthy, tetraploid kary- 
otypes were not found in BPK282 cl4 and the only three 
cells identified with a potential baseline triploidy exhibited 


an aneuploidy pattern very distinct from any other kary- 
otype in that population (Supplementary Figure S7). 

Moreover, within the BPK 282 cl4 and BPK081 cl8 popu- 
lations, we also observed rare cells displaying chromosomes 
with an estimated somy of 0 (nullisomy). The bam file of 
these cells showed that no reads were mapping to these chro- 
mosomes in sharp contrast with other chromosomes of the 
same cell, suggesting that in these cells, these chromosomes 
were absent (Figure 4A). Nullisomic chromosomes were 
found in all the populations sequenced here: among which, 
4 in BPK081 cl8 (0,15% of the sequenced cells) and 15 from 
BPK282 cl4 (0,88%). Moreover, the aneuploidy profile of 
these nullisomic cells was not similar to any other karyotype 
identified in each sample (Figure 4B). Partial chromosome 
deletions were also observed, as for instance in chr1l3 and 
chr36 of the cell 688 from BPK282 cl4, in the chr36 of the 
cell 266 from BPK081 cl8. 
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Figure 4. Cells with nullisomic chromosomes. (A) Example of cells displaying one or more nullisomic chromosomes. The dots represent the normalized 
read depth of each 20 kb bin. The integer somy values calculated for each cell are depicted in the top part of each box for cells that were not excluded 
from analysis. A black line shows the integer somy values divided by the cell’s scale factor (Sc) for comparison. (B) Karyotype of all cells with at least one 
nullisomic chromosome identified in our SCGS data. Cells that were removed from analysis and therefore did not have their somy values converted to 
integers are separated in the right panel, displaying their raw somy values instead. 


DISCUSSION 


Cellular heterogeneity is increasingly implicated as one of 
the major sources of adaptative potential for unicellular 
pathogens (42,43). We explored here a specific manifesta- 
tion of this phenomenon, 1.e. mosaic aneuploidy, in a unique 
model, Leishmania. By applying a high throughput SCGS 
method, we could determine for the first time the complete 
karyotype of thousands of individual Leishmania cells from 
two distinct clonal populations in vitro. We found a high 
level of mosaic aneuploidy, affecting essentially the same, 
limited subset of chromosomes. We explored the evolution 
of mosaicism in both populations, starting from two distinct 


founder karyotypes, one nearly euploid and another highly 
aneuploid. 

The present SCGS study allowed us to evaluate and ex- 
tend hypotheses on mosaic aneuploidy in Leishmania pre- 
viously based on FISH measurements (3,26). Although 
some divergencies were observed here between FISH and 
SCGS, our data are in agreement with most predictions. 
Accordingly, mosaic aneuploidy was confirmed in all pop- 
ulations sequenced here, and karyotypes frequency distri- 
butions, in particular for BPK282 cl4 clone (208 kary- 
otypes among 1516 cells), were similar to the distribu- 
tion predicted with FISH data obtained for seven chro- 
mosomes of a long-term cultivated Leishmania major 
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population (~250 karyotypes in ~2000 cells—Sterkers et 
al, 2012 Figure 4). Similarly, the observation of nullisomic 
chromosomes by SCGS supported previous FISH analy- 
sis of dividing nuclei: this showed that for both chromo- 
somes 2 and 22 of L. major, around 1% of the evaluated 
parasites were displaying a ‘1 + 0’ distribution of chro- 
mosomes between daughter cells (3). In BPK081 cl8, pro- 
portionally fewer karyotypes were identified compared to 
BPK282 cl4, which might be a consequence of either a re- 
duced tendency of the founder diploid karyotype to somy 
alterations and/or due to the fact this clone was at an ear- 
lier stage of expansion in vitro (~56 generations, compared 
to the ~126 generations in BPK282). Indeed, when nor- 
malizing the number of karyotypes, similar values were ob- 
served for both clones: respectively 10exp* and 9exp* new 
karyotypes/generation/sequenced cell. 

Our SCGS data, however, do not corroborate the previ- 
ous assumptions that all chromosomes are found with at 
least two somy states (3,26), as high levels of somy vari- 
ation were restricted to a subset of chromosomes in our 
experimental conditions. Noteworthy, this feature could be 
species-specific as the analysis of single L. braziliensis cells 
in similar conditions of in vitro maintenance showed a gen- 
eralized trisomy of chromosomes chrll and chr25 (27), 
while these two chromosomes were disomic in the majority 
of L. donovani cells here studied. We also observed a higher 
tendency of FISH to report trisomies and monosomies in 
chromosomes which were defined by SCGS as mostly di- 
somic in almost all cells of BPK282 cl4 and BPK081 cl8 
clones, as chr01 and chr22. This discrepancy 1s likely due to 
accuracy limitations in FISH (44,45). 

The SCGS data reported here also allowed us to draw 
some hypothesis regarding the origin and evolution of mo- 
saic aneuploidy in vitro. We have previously demonstrated 
that intracellular amastigotes sequenced directly from pa- 
tient samples usually display a diploid average aneuploidy 
similar to the profile of the BPK081 cl8 clone, although 
variations in somies were observed in some samples (46). 
However, when these amastigotes were isolated from pa- 
tients or experimental animals and transformed to pro- 
mastigotes in vitro, in most cases they progressively evolve 
towards highly aneuploid profiles (19,46,47). Thus, the 2 
clones here studied provide complementary models to un- 
derstand the dynamics of the emergence of mosaic aneu- 
ploidy in vitro; BPKO81 cl8 which founder karyotype had 
the diploid profile, representing an early stage of expansion 
in culture; and BPK282 cl4, which founder karyotype was 
already highly aneuploid (likely karl), representing later 
Stages. 

In the BPK081 cl8, a minority of highly aneuploid sub- 
populations were observed, contrasting with the founder 
diploid karyotype (kar1), indicating that at early stages of 
clonal expansion in culture, the evolution of mosaicism 
starts with drastic changes in karyotypes, in this case the 
observed changes in somy of 8 to 10 chromosomes lead- 
ing to highly aneuploid cells (kar2 and kar3). These dras- 
tic changes in somies could occur through cumulative small 
steps, 1.e., somy alterations in single chromosomes at each 
cell division, followed by fixation and further expansion 
of the fittest aneuploidies and loss of intermediate links 


between these karyotypes during clonal evolution. Alter- 
natively, kar2 and kar3 in BPKO81 cl8 may have origi- 
nated independently from karl by simultaneous amplifi- 
cations of multiple chromosomes. However, the presence 
of potentially triploid cells which resemble kar2 and kar3 
opens other possibilities. On one hand, polyploidization has 
been demonstrated as an important mechanism in yeasts 
for quickly generating multiple and highly discrepant aneu- 
ploid karyotypes from a single parent through assorted mis- 
segregation of chromosomes during downstream cell divi- 
sions (9). In case a similar mechanism occurs in Leishmania, 
these 3n karyotypes found in BPK081 cl8 could represent 
an intermediate step between whole genome polyploidiza- 
tion event and reversion to aneuploid kar2 and kar3. On 
the other hand, 3n karyotypes could be reminiscent of hy- 
bridization, which was recently shown to occur in vitro (48); 
the common observation of 3n karyotypes in Leishmania af- 
ter hybridization in sand flies supports this hypothesis (49— 
52). Further work is required in order to test these hypoth- 
esis. 

Surrounding the three major karyotypes in the network 
of BPK081 cl8, other minor karyotypes with single somy 
alterations are observed, suggesting that once a successful 
karyotype expands, small variations of it are continuously 
generated by small changes in somies. This pattern is more 
evident in the karyotype network of BPK282 cl4, where al- 
most all karyotypes which are found in at least two cells are 
at one somy change distance from another karyotype, sug- 
gesting that these karyotypes were also continuously gen- 
erated by cumulative steps of small somy alterations. Ac- 
cordingly, the founder karyotype of this clone (likely kar1) 
was already highly aneuploid and well adapted to culture, 
as the parent population from which BPK282 cl4 was 1so- 
lated was already in culture for 21 passages (Supplementary 
Figure S1). 

Karyotypes displaying several polysomies are observed 
in most in vitro cultured Leishmania promastigotes anal- 
ysed so far in bulk (29,53,54). This usually affects a spe- 
cific group of chromosomes, largely overlapping with the 8 
mainly polysomic chromosomes described here. The early 
amplifications reproducibly observed in the average aneu- 
ploidy profile of parasite populations in transition from in 
vivo to in vitro (46,47) suggest an adaptative role for specific 
polysomies in adaptation to culture. However, the mecha- 
nisms that determine which chromosomes are amplified are 
still poorly understood. 

By investigating which chromosomes were more prone 
to somy alterations in rare and common karyotypes, 
we gathered evidence suggesting that all chromosomes 
can be stochastically amplified during population expan- 
sion, potentially at different rates, but we hypothesize 
that selective forces likely dictate the higher frequency of 
polysomies observed in some chromosomes. Changes in 
the average chromosome copy numbers of cell popula- 
tions are directly reflected in the average amount of tran- 
scripts encoded by the genes present on these chromo- 
somes (19,23) and to a certain degree also affect the aver- 
age amount of certain proteins (24). Consequently, aneu- 
ploidy might lead to dosage imbalances between the prod- 
uct of genes located in chromosomes that display different 
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somies. The frequently observed co-modulation of multiple 
chromosomes—estimated with Spearman correlations with 
single-cell resolution here and across 204 L. donovani 1so- 
lates, in bulk, as previously described (23)—might reflect a 
dynamic compensation mechanism that reduces these 1m- 
balances and at the same time increases the dosage of key 
genes. Our GO analyses did not reveal any enrichment of 
biological functions in the (co-) amplified chromosomes. 
However, we observed an enrichment of snoRNA genes 
in some of the mainly polysomic and intermediate chro- 
mosomes, accordingly chr05, chr26, chr33 and chr35. This 
class of small RNAs is involved in the extensive process- 
ing of ribosomal RNA (rRNA) characteristic of trypanoso- 
matids, directly affecting ribosomal biosynthesis and ulti- 
mately translation, both increased in cultured promastig- 
otes (55,56). Amplification of these chromosomes as seen 
in many cells in vitro might ultimately boost the translation 
capacity of the cells due to a consequent higher abundance 
of snoRNAs. At the time of submission of the present ar- 
ticle, a pre-print of a study was published where the role 
of posttranscriptional regulation in Leishmania fitness gain 
was examined (57). In this manuscript, snoR NAs are postu- 
lated to play a key role in differential regulation of mRNA 
stability through change in the composition of ribosomes. 
Specifically, differential expression of snoR NAs was shown 
to be correlated with changes in rRNAs epigenetic mod- 
ifications, which were proposed to result in formation of 
fitness-adapted ribosomes. Consequently, by regulating the 
composition of ribosomes, the expression of beneficial and 
deleterious gene dosage effects could be regulated through 
translational control. Amplification of chromosomes carry- 
ing these snoR NAs as seen in many cells in vitro might ulti- 
mately alter and adapt the translational capacity of the cells 
due to a consequent higher abundance of snoRNAs as pos- 
tulated by Spaeth et al. 

The high diversity of karyotypes identified in both mod- 
els here described is in agreement with the idea of mo- 
saic aneuploidy being a constitutive feature in Leishmania 
(25). Knowing that average aneuploidy (as defined by bulk 
genome sequencing) is much lower and different in vivo than 
in vitro (19), further work would be required to compare 
the extent and nature of mosaicism in vitro and in vivo. The 
generation of karyotypic heterogeneity represents a source 
of functional diversity, due to variations in genes dosage 
(19), and it is also expected to facilitate the removal of detri- 
mental mutations and the fixation of beneficial haplotypes 
(23,26). Although in a given environment some very dif- 
ferent karyotypes might be limited to low frequencies, they 
may provide to the population a major (pre-)adaptation po- 
tential to unpredictable environmental changes, such as a 
change of host or drug pressure associated to chemother- 
apy (19,21,22,58). Time-lapse SCGS studies of populations 
of parasites during clonal expansion under stable or vary- 
ing environments are needed to monitor the dynamics of 
mosaicism and test this pre-adaptation hypothesis. Com- 
bining SCGS with single-cell transcriptomics could also al- 
low to understand better the impact of gene dosage imbal- 
ance on transcription with a single-cell resolution. Thus, 
high throughput single-cell sequencing methods represent 
a remarkable tool to understand key aspects of Leishmania 
biology and adaptability. 
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